Abstract 

In this paper we give upper and lower bounds as well as a heuristic 
estimate on the number of vertices of the convex closure of the set 

$$G_n = \{(a,b) : a,b \in \mathbb{Z},\ ab \equiv 1 \pmod n,\ 1 \le a,b \le n-1\}.$$

The heuristic is based on an asymptotic formula of Rényi and Sulanke. 
After describing two algorithms to determine the convex closure, we 
compare the numeric results with the heuristic estimate. The numeric 
results do not agree with the heuristic estimate: there are some 
interesting peculiarities for which we provide a heuristic explanation. We 
then describe some numerical work on the convex closure of the graph 
of random quadratic and cubic polynomials over $\mathbb{Z}_n$. In this case the 
numeric results are in much closer agreement with the heuristic, which 
strongly suggests that the curve $xy \equiv 1 \pmod n$ is "atypical". 

1 Introduction 

Let $G_n$ be the set 

$$G_n = \{(a,b) : a,b \in \mathbb{Z},\ ab \equiv 1 \pmod n,\ 1 \le a,b \le n-1\},$$

whose cardinality is given by the Euler function $\varphi(n)$. If we scale by a factor 
of $1/n$ we get the set of points $n^{-1}G_n$, which is uniformly distributed in the 
unit square. More precisely, if $\Omega \subseteq [0,1]^2$ has piecewise smooth boundary and 
$N(\Omega,n)$ is the cardinality of the intersection $\Omega \cap n^{-1}G_n$, then it is natural to 
expect, and in fact can be proved by using the bounds of Kloosterman sums, 
that 

$$\frac{N(\Omega,n)}{\varphi(n)} - |\Omega| \to 0 \quad \text{as } n \to \infty, \qquad (1)$$

where $|\Omega|$ is the area of $\Omega$. Figure 1, generated by Maple, illustrates this 
property. 

Fig. 1. The graph $G_{5001}$ 

Quantitative forms of (1) have been given in a number of works, see [3], 
[10], [25], [26], [27] and references therein. For example, it follows from more 
general results of [10] that for primes $p$, 

$$\left| \frac{N(\Omega,p)}{p-1} - |\Omega| \right| = O\left(p^{-1/4}\log p\right), \qquad (2)$$

where the implied constant depends only on $\Omega$. 
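
As a small numerical illustration of (1), the following Python sketch builds $G_n$ by brute force and measures the fraction of the scaled points $n^{-1}G_n$ that fall in an axis-parallel box. The names `inversion_graph` and `box_fraction` are ours and purely illustrative, and the enumeration is a naive loop over all residues, not one of the algorithms of Section 4.

```python
from math import gcd

def inversion_graph(n):
    """Return G_n = {(a, b) : a*b = 1 (mod n), 1 <= a, b <= n-1} as a list."""
    # pow(a, -1, n) is the modular inverse (Python 3.8+); it exists iff gcd(a, n) = 1.
    return [(a, pow(a, -1, n)) for a in range(1, n) if gcd(a, n) == 1]

def box_fraction(n, x0, x1, y0, y1):
    """Fraction of the points of n^{-1} G_n lying in the box [x0, x1] x [y0, y1]."""
    points = inversion_graph(n)
    inside = sum(1 for a, b in points
                 if x0 * n <= a <= x1 * n and y0 * n <= b <= y1 * n)
    return inside / len(points)

if __name__ == "__main__":
    # For large n the fraction is expected to approach the area of the box (0.25 here).
    print(box_fraction(5001, 0.0, 0.5, 0.0, 0.5))
```

The list returned by `inversion_graph(n)` has $\varphi(n)$ elements and is closed under swapping coordinates, which matches the definition of $G_n$ above.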

Here we continue to study some geometric properties of the set $G_n$ and in 
particular concentrate on the convex closure $C_n$ of $G_n$. One of our questions 
of interest is the behavior of $v(n)$ and $V(N)$, where $v(n)$ denotes the number 
of vertices of $C_n$ and $V(N)$ denotes the average 



$$V(N) = \frac{1}{N-1}\sum_{n=2}^{N} v(n).$$
We demonstrate that the theoretic and algorithmic study of $v(n)$ has 
surprising links with various areas of number theory, such as bounds of 
exponential sums, distribution of divisors of "typical" integers and integer 
factorisation. On the other hand, we present heuristic estimates $h(n)$ and $H(N)$ 
for $v(n)$ and $V(N)$, respectively. These heuristic estimates arise by viewing 
$G_n$ as a set of points that are randomly distributed and then using the result 
of Rényi and Sulanke [17, Satz 1]. On comparing with our numeric results 
we see that although the heuristic prediction $H(N)$ gives an adequate idea 
about the type of growth of $V(N)$, there is a deviation which behaves quite 
regularly and thus probably reflects certain other hidden effects. We 
suggest some explanation. We also examine numerically some other interesting 
peculiarities in the behaviour of $v(n)$ which lead us to several open questions. 

Finally, we present some numerical evidence suggesting that the above 
effects do not arise for sets of points on other curves which behave more 
like truly random sets of points, which makes the study of Gn even more 
interesting. 

We note that some other geometric properties of the points of $G_n$ have 
recently been considered in [20]. A survey of recent results about the 
distribution of points of $G_n$ and more general sets corresponding to congruences 
of the type $ab \equiv \lambda \pmod n$ with some fixed $\lambda$ is given in 



2 Some Preliminary Observations 

2.1 General structure of $C_n$ 

We begin with a simple (but useful) remark on two lines of symmetry of $G_n$. 

Proposition 1. The points of $G_n$ are symmetrically distributed about the 
lines $y = x$ and $x + y = n$. 

Therefore, if $(a,b) \in G_n$, then its reflection in $y = x$, namely $(b,a)$, and its 
reflection in $x + y = n$, namely $(n-b, n-a)$, are elements of $G_n$. Consequently, 
$(a,b)$ is a boundary point of $C_n$ if and only if $(b,a)$, $(n-b,n-a)$ and 
$(n-a,n-b)$ are boundary points of $C_n$. 

Our next result shows that $C_n$ is always a convex polygon with nonempty 
interior, except when $n = 2, 3, 4, 6, 8, 12, 24$. 

Proposition 2. $|C_n| = 0$ if and only if $n = 2, 3, 4, 6, 8, 12$ or $24$. 

Proof. This follows by observing that for these moduli all of the elements in 
$\mathbb{Z}_n^*$ (that is, all units of the residue ring modulo $n$) have order dividing 2. Consequently, 
for these moduli all of the elements of $G_n$ lie on the line $y = x$. □ 
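
The list of exceptional moduli in Proposition 2 is easy to confirm by machine for small $n$. The sketch below is ours (the helper names `is_degenerate` and `degenerate_moduli` are illustrative): it tests whether every unit modulo $n$ is its own inverse, which is exactly the condition that all points of $G_n$ lie on the line $y = x$.

```python
from math import gcd

def is_degenerate(n):
    """True if every unit a modulo n satisfies a*a = 1 (mod n), so G_n lies on y = x."""
    return all(a * a % n == 1 % n for a in range(1, n) if gcd(a, n) == 1)

def degenerate_moduli(limit):
    """All n in [2, limit] for which the convex closure C_n has empty interior."""
    return [n for n in range(2, limit + 1) if is_degenerate(n)]

if __name__ == "__main__":
    print(degenerate_moduli(1000))
```

Within any finite search range only the seven moduli listed in Proposition 2 should appear.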



From now on we typically exclude the cases n = 2, 3, 4, 6, 8, 12 and 24. 

2.2 Points in the triangle $\mathcal{T}_n$ 

By Proposition 1 we only need to know the vertices of $C_n$ that lie in the 
triangle $\mathcal{T}_n$ with vertices $(0,0)$, $(0,n)$ and $(n/2,n/2)$ to determine $C_n$. We 
denote the vertices of $C_n$ that lie in the triangle $\mathcal{T}_n$ by 

$$(a_0,b_0), (a_1,b_1), \ldots, (a_s,b_s) \in C_n \cap \mathcal{T}_n,$$

where $a_0 < a_1 < \cdots < a_s$. 

Proposition 3. We have the following: 

1. $(a_0,b_0) = (1,1)$; 

2. $a_i < b_i$ for $i = 1,\ldots,s$; 

3. $b_0 < b_1 < \cdots < b_s$; 

4. $b_i - a_i < b_{i+1} - a_{i+1}$ for $i = 0,\ldots,s-1$. 

Proof. Assertions 1 and 2 are clear. Assertions 3 and 4 follow from the 
following observation. The line through $(a_i,b_i)$ and its symmetric counterpart 
$(n-b_i, n-a_i)$ intersects the line $x + y = n$ at the point $((n-b_i+a_i)/2, (n+b_i-a_i)/2)$. 
Since $a_i < a_{i+1}$ and $(a_{i+1},b_{i+1})$ is a vertex of $C_n$, it follows that 
$(a_{i+1},b_{i+1})$ must actually lie inside the smaller triangle with vertices $(a_i,b_i)$, 
$(a_i, n-a_i)$ and $((n-b_i+a_i)/2, (n+b_i-a_i)/2)$. □ 



2.3 On the difference $b_s - a_s$ 

The inequalities in Proposition 3 may seem obvious, but they play a key role 
in our algorithms to compute the vertices of $C_n$. The vertex $(a_s,b_s)$ has an 
important property. Let $M(n)$ denote the quantity 

$$M(n) = \max\{|a-b| : 1 \le a,b \le n-1 \text{ and } ab \equiv 1 \pmod n\}. \qquad (3)$$

An immediate consequence of Proposition 3 is that 

$$b_s - a_s = M(n).$$

The quantity M{n) has been studied in [HI [T51 [IS]. It is shown in 
that 

$$n - M(n) \le n^{3/4+o(1)}. \qquad (4)$$

On the other hand, by [8, Theorem 3.1], for almost all $n$ 

$$n - M(n) \ge n^{1/2}(\log n)^{\delta/2}(\log\log n)^{3/4} f(n),$$

where 

$$\delta = 1 - \frac{1+\log\log 2}{\log 2} = 0.086071\ldots,$$

and $f(x)$ is any positive function tending monotonically to zero as $x \to \infty$. 
We recall that it has been proposed in [8, Conjecture 4.1] that the above 
bound is quite tight: 

Conjecture 4. For almost all $n$ 

$$n - M(n) \ll n^{1/2}(\log n)^{\delta/2}(\log\log n)^{3/4} g(n),$$

where $g(x)$ is any function tending monotonically to $\infty$ as $x \to \infty$. 

In support of Conjecture 4 we make the following observation. For a fixed 
$\varepsilon > 0$ define the set 

$$\mathcal{N}(\varepsilon) = \{n \in \mathbb{N} : \exists\, d \mid (n-1) \text{ such that } n^{1/2-\varepsilon} \le d \le n^{1/2}\}.$$

By [11, Theorem 22], $\mathcal{N}(\varepsilon)$ has positive asymptotic density. Since 

$$d\left(n - \frac{n-1}{d}\right) \equiv 1 \pmod n,$$

we see that 

$$n - M(n) \le n - \left(n - \frac{n-1}{d} - d\right) = \frac{n-1}{d} + d \ll n^{1/2+\varepsilon}$$

for every $n$ with this property. This immediately implies that for any $\varepsilon > 0$ 

$$n - M(n) \ll n^{1/2+\varepsilon}$$

for a set of $n$ of positive density, which is a weaker form of what is assumed 
in Conjecture 4. In [8], one can also find more developed heuristic arguments 
supporting Conjecture 4. 
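
The divisor argument above is easy to test numerically. In the sketch below (our own illustration; `max_difference` and `divisor_bound` are hypothetical helper names) $M(n)$ is computed by brute force and compared with the smallest value of $d + (n-1)/d$ over the divisors $d$ of $n-1$, which by the displayed inequality is an upper bound for $n - M(n)$.

```python
from math import gcd

def max_difference(n):
    """M(n) = max |a - b| over pairs with a*b = 1 (mod n), 1 <= a, b <= n-1."""
    return max(abs(a - pow(a, -1, n)) for a in range(1, n) if gcd(a, n) == 1)

def divisor_bound(n):
    """min over divisors d of n-1 of d + (n-1)/d, an upper bound for n - M(n)."""
    m = n - 1
    return min(d + m // d for d in range(1, m + 1) if m % d == 0)

if __name__ == "__main__":
    for n in (101, 1001, 5001):
        print(n, n - max_difference(n), divisor_bound(n))
```

When $n-1$ has a divisor close to $\sqrt{n}$ the bound is of order $\sqrt{n}$, which is the situation exploited in the argument above.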



We make one other remark about the vertex $(a_s,b_s)$. Following [22], we 
introduce the quantities 

$$\rho_1(m) = \max_{d \mid m,\ d \le \sqrt{m}} d \quad\text{and}\quad \rho_2(m) = \min_{d \mid m,\ d \ge \sqrt{m}} d.$$

We note that 

$$a_s = \rho_1(kn-1) \quad\text{and}\quad n - b_s = \rho_2(kn-1),$$

where $k$ is the integer such that $a_s(n-b_s) = kn-1$. 

2.4 Heuristic 

Our heuristic attempt to approximate $v(n)$ makes use of a probabilistic 
model. Specifically, we view the points of $n^{-1}G_n$ as being randomly 
distributed in the unit square (which is supported by theoretic results of [SI [TOl 
1221 Sni EZ]) and then appeal to a result of Rényi and Sulanke [17, Satz 1]. Let 
$\mathcal{R}$ be a convex polygon in the plane with $r$ vertices and let $P_i$, $i = 1,\ldots,n$, 
be $n$ points chosen at random in $\mathcal{R}$ with uniform distribution. Let $X_n$ be the 
number of sides of the convex closure of the points $P_i$, and let $E(X_n)$ be the 
expectation of $X_n$. Then 

$$E(X_n) = \frac{2r}{3}(\log n + \gamma) + c_{\mathcal{R}} + o(1), \qquad (5)$$

where $\gamma = 0.577215\ldots$ is the Euler constant, and $c_{\mathcal{R}}$ depends on $\mathcal{R}$ and 
is maximal when $\mathcal{R}$ is a regular $r$-gon or is affinely equivalent to a regular 
$r$-gon. In particular, for the unit square $\mathcal{R} = [0,1]^2$ we have 

$$c_{\mathcal{R}} = -\frac{8}{3}\log 2.$$

More precise results are given by Buchta and Reitzner ^, but they do not 
affect our arguments. 

Using (5) with $r = 4$, it is plausible to conjecture that for most $n$ 

$$v(n) \approx h(n), \qquad (6)$$

where 

$$h(n) = \frac{8}{3}(\log\varphi(n) + \gamma - \log 2).$$

A portion of our work has been to generate numerical data to test this 
conjecture. 
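
For concreteness, the heuristic estimate $h(n)$ can be evaluated as follows. This is a minimal sketch of ours; the totient is computed by a direct count, which is adequate only for small $n$, and the function names are illustrative.

```python
from math import gcd, log

EULER_GAMMA = 0.5772156649015329  # Euler's constant

def euler_phi(n):
    """Euler's totient function by direct count (adequate for small n)."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

def h(n):
    """Heuristic estimate (6): h(n) = (8/3) * (log phi(n) + gamma - log 2)."""
    return 8.0 / 3.0 * (log(euler_phi(n)) + EULER_GAMMA - log(2.0))

if __name__ == "__main__":
    print(h(41))
```

Note that $h(n)$ depends on $n$ only through $\varphi(n)$, so it grows like $\frac{8}{3}\log n$ and is insensitive to the divisor structure of $n-1$.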



3 Bounds on $v(n)$ 

3.1 Lower Bounds 

Here we give a lower bound on $v(n)$ in terms of the number of divisors 
function $\tau(n)$. We begin by establishing some notation and making a couple 
of pertinent observations. 

For a fixed $n$, let us consider the curves $\alpha_j(n)$ and $\beta_j(n)$ defined by 

$$\alpha_j(n) : x(n-y) = jn-1, \quad 1 \le x \le y \le n-1,$$

$$\beta_j(n) : y(n-x) = jn-1, \quad 1 \le y \le x \le n-1.$$

A key observation used repeatedly is that for each point of $G_n$ there is a $j$ in 
the range $1,\ldots,\lceil n/4 \rceil$ such that the point lies on the curve $\alpha_j(n)$ or $\beta_j(n)$. 
We denote the region bounded by the curves $\alpha_1(n)$ and $\beta_1(n)$ by $\mathcal{R}_n$. The 
next figure is an illustrative example. We note that the outermost curves are 
$\alpha_1(41)$, $\beta_1(41)$. 

Fig. 2. The graph $G_{41}$ and the curves $\alpha_j(41)$, $\beta_j(41)$, $j = 1,2,3,4$ 



For an integer $s > 1$ we denote 

$$T(s) = \max_{j=1,\ldots,\tau(s)-1} \frac{d_{j+1}}{d_j},$$

where $1 = d_1 < \cdots < d_{\tau(s)} = s$ are the positive divisors of $s$. 
Clearly, 

$$T(s) \le P(s), \qquad (7)$$

where $P(s)$ denotes the largest prime divisor of $s$. 

Let $D_n$ be the convex closure of the points $(d_i, n-(n-1)/d_i)$, $(n-(n-1)/d_i, d_i)$, 
for $i = 1,\ldots,\tau(n-1)$, where the $d_i$ are the positive divisors of $n-1$. Clearly, we 
have the inclusions $D_n \subseteq C_n \subseteq \mathcal{R}_n$. We remark that if $n-1$ is prime, the set 
$D_n$ is simply the line segment connecting the points $(1,1)$ and $(n-1,n-1)$. 

The purpose of our next proposition is to give a criterion to determine 
which of the $\alpha_j(n)$, $2 \le j \le \lceil n/4 \rceil$, lie strictly in the interior of $D_n$, and hence 
strictly in the interior of $C_n$. We denote by $\Gamma_n$ the set of boundary points 
$(x,y)$ of $D_n$ such that $y \ge x$, that is, $\Gamma_n = \{(x,y) : (x,y) \in \partial D_n,\ y \ge x\}$. 

Proposition 5. Let $1 = d_1 < \cdots < d_{\tau(n-1)} = n-1$ be the positive divisors 
of $n-1$. Then, for any integer $m \ge 2$, 

$$\Gamma_n \cap \alpha_m(n) = \emptyset \iff \frac{d_{i+1}}{d_i} + \frac{d_i}{d_{i+1}} < 4m - 2 + \frac{4(m-1)}{n-1}, \quad i = 1,\ldots,\tau(n-1)-1.$$

Proof. This is a routine computation and so we only sketch an outline. The 
polygonal curve $\Gamma_n$ is the union of line segments 

$$L_i : (1-t)\left(d_i,\, n-\frac{n-1}{d_i}\right) + t\left(d_{i+1},\, n-\frac{n-1}{d_{i+1}}\right), \quad 0 \le t \le 1,$$

with $i = 1,\ldots,\tau(n-1)-1$. Now $L_i \cap \alpha_m(n) = \emptyset$ if and only if the quadratic 
equation in $t$ 

$$(d_{i+1}-d_i)\left(\frac{n-1}{d_{i+1}}-\frac{n-1}{d_i}\right)t^2 - (d_{i+1}-d_i)\left(\frac{n-1}{d_{i+1}}-\frac{n-1}{d_i}\right)t + (1-m)n = 0$$

has no real solutions. □ 

A useful consequence of Proposition 5 is that if 

$$m \ge \frac{T(n-1)+3}{4}$$

with $m \in \mathbb{Z}$ and $m \ge 2$, then 

$$\Gamma_n \cap \alpha_m(n) = \emptyset. \qquad (8)$$
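
The criterion of Proposition 5 and its consequence (8) involve only the divisors of $n-1$, so they can be checked with exact rational arithmetic. The following sketch is ours (all function names are illustrative, and divisors are found by trial division, which is fine for small $n$): it computes $T(s)$, evaluates the inequality of Proposition 5, and returns the smallest integer $m \ge 2$ allowed in (8).

```python
from fractions import Fraction
from math import ceil

def divisors(s):
    """Sorted list of the positive divisors of s."""
    return [d for d in range(1, s + 1) if s % d == 0]

def max_divisor_ratio(s):
    """T(s): the largest ratio d_{j+1}/d_j of consecutive divisors of s (s > 1)."""
    ds = divisors(s)
    return max(Fraction(ds[j + 1], ds[j]) for j in range(len(ds) - 1))

def gamma_misses_alpha(n, m):
    """Inequality of Proposition 5: True iff it holds for every consecutive pair."""
    ds = divisors(n - 1)
    bound = 4 * m - 2 + Fraction(4 * (m - 1), n - 1)
    return all(
        Fraction(ds[i + 1], ds[i]) + Fraction(ds[i], ds[i + 1]) < bound
        for i in range(len(ds) - 1)
    )

def first_excluded_index(n):
    """Smallest integer m >= 2 with m >= (T(n-1) + 3)/4, as in (8)."""
    return max(2, ceil((max_divisor_ratio(n - 1) + 3) / 4))
```

For example, $n = 31$ has $T(30) = 2$, so already $m = 2$ satisfies (8), whereas for $n = 8$ (where $n - 1 = 7$ is prime) the inequality of Proposition 5 fails for $m = 2$.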



Theorem 6. For all $n \ge 2$, 

$$v(n) \ge 2(\tau(n-1)-1),$$

and for sufficiently large $x$, 

$$\#\{n \le x : v(n) = 2(\tau(n-1)-1)\} \gg \frac{x}{\log x}.$$

Proof. Since $C_n \subseteq \mathcal{R}_n$, any $(x,y) \in G_n \cap (\alpha_1(n) \cup \beta_1(n))$ is a vertex of $C_n$, 
and either $x$ or $y$ is a divisor of $n-1$. Therefore, $v(n) \ge 2(\tau(n-1)-1)$. 
By (8) we have $\Gamma_n \cap \alpha_2(n) = \emptyset$ for every $n$ with $T(n-1) \le 5$. 
Consequently, for such $n$, all of the vertices of $C_n$ lie on $\alpha_1(n) \cup \beta_1(n)$ and thus 
$v(n) = 2(\tau(n-1)-1)$. On the other hand, by [18, Theorem 1], we know 
that for any fixed $t$ and sufficiently large $x$, 

$$\#\{n \le x : T(n) \le t\} \gg \frac{x\log t}{\log x}.$$

Applying this result with $t = 5$ we conclude the proof. □ 

It is easy to construct explicit examples of $n$ with $v(n) = 2(\tau(n-1)-1)$. 
For instance it follows from (7) and (8) that this holds for $n = 2^r3^s5^t + 1$, 
where $r, s, t$ are non-negative integers. 
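
For small moduli the lower bound of Theorem 6 can be compared with the true value of $v(n)$. The sketch below is our own brute-force check: it generates all of $G_n$ and applies Andrew's monotone chain algorithm, a standard convex hull method that is not one of the algorithms of Section 4. The function names are illustrative.

```python
from math import gcd

def cross(o, a, b):
    """Twice the signed area of the triangle o, a, b (positive for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def hull_vertices(points):
    """Vertices of the convex closure (monotone chain, collinear points dropped)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def half(seq):
        chain = []
        for p in seq:
            # Pop while the last two points and p fail to make a strict left turn.
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]  # the last point starts the other half

    return half(pts) + half(reversed(pts))

def v(n):
    """Number of vertices of C_n, computed by brute force from all of G_n."""
    graph = [(a, pow(a, -1, n)) for a in range(1, n) if gcd(a, n) == 1]
    return len(hull_vertices(graph))

def tau(s):
    """Number of positive divisors of s."""
    return sum(1 for d in range(1, s + 1) if s % d == 0)
```

For instance $v(7) = 6 = 2(\tau(6)-1)$: the six vertices are $(1,1)$, $(2,4)$, $(3,5)$, $(6,6)$, $(5,3)$ and $(4,2)$, all of which lie on $\alpha_1(7) \cup \beta_1(7)$.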

Since for any $\delta > 0$ we have 

$$\limsup_{k\to\infty} \tau(k)\, 2^{-(1-\delta)\log k/\log\log k} = \infty$$

(see [12, Theorem 317]), the same holds true for $v(n)$, and so we can infer 
that the heuristic estimate (6) is sometimes exponentially smaller than $v(n)$. 

Corollary 7. For any $\delta > 0$ 

$$\limsup_{n\to\infty} v(n)\, 2^{-\frac{3}{8}(1-\delta)h(n)/\log h(n)} = \infty.$$

We have that $v(n) \ge 2(\tau(n-1)-1)$, and it is natural to ask when 
one has strict inequality. Our next result gives a partial answer to this 
question. Specifically, we exhibit a set of positive density for which we have 
strict inequality. Furthermore, if we assume Conjecture 4 then we have strict 
inequality for almost all $n$. 

Theorem 8. The strict inequality 

$$v(n) > 2(\tau(n-1)-1)$$

holds 

i. for a set of $n$ of positive density, 
ii. for almost all $n$, provided that for almost all $n$ we have $n - M(n) \le n^{1/2+o(1)}$. 

Proof. i. Let 

$$\mathcal{E}(x) = \{n \le x : v(n) = 2(\tau(n-1)-1)\},$$

and 

$$\mathcal{I}(x) = \{n \le x : a_s(n-b_s) = n-1\}.$$

It is important to note that the values of $s$, $a_s$ and $b_s$ all depend on $n$. We 
remind the reader of the following properties of the point $(a_s,b_s)$ used in the 
proof below: it is the highest vertex of $C_n$ that lies on or below the line 
$x + y = n$, $M(n) = b_s - a_s$ and $a_s \le n - b_s$. Clearly, $\mathcal{E}(x) \subseteq \mathcal{I}(x)$. 
The set of positive density we have in mind is 

$$\mathcal{A}(x) = \{n \le x : \exists\, p \text{ prime with } p \mid (n-1) \text{ and } p \ge x^{0.76}\}.$$

Using Mertens's formula (see [12, Theorem 427]), we get that 

$$\#\mathcal{A}(x) = \sum_{x^{0.76} \le p \le x} \left\lfloor \frac{x-1}{p} \right\rfloor \sim (\log(25/19))\,x.$$

Since $\mathcal{E}(x) \subseteq \mathcal{I}(x)$, in order to prove 

$$\lim_{x\to\infty} \frac{\#(\mathcal{A}(x) \cap \mathcal{E}(x))}{x} = 0$$

it is enough to prove that 

$$\lim_{x\to\infty} \frac{\#(\mathcal{A}(x) \cap \mathcal{I}(x))}{x} = 0.$$

We now write $\mathcal{I}(x)$ as the disjoint union of the two sets $\mathcal{I}_1(x)$, $\mathcal{I}_2(x)$, 
where 

$$\mathcal{I}_1(x) = \{n \in \mathcal{I}(x) : n - b_s \le x^{0.24}\},$$

$$\mathcal{I}_2(x) = \{n \in \mathcal{I}(x) : x^{0.24} < n - b_s < x^{0.76}\}.$$

The exponent values, 0.24 and 0.76, come from the bound $n - M(n) \le n^{3/4+o(1)}$ 
that we mentioned earlier. Since $\#\mathcal{I}_1(x) \le x^{0.48}$ and for $x$ large 
$\mathcal{A}(x) \cap \mathcal{I}_2(x) = \emptyset$, it follows that for large $x$ 

$$\#(\mathcal{A}(x) \cap \mathcal{I}(x)) = \#(\mathcal{A}(x) \cap \mathcal{I}_1(x)) + \#(\mathcal{A}(x) \cap \mathcal{I}_2(x)) \le \#\mathcal{I}_1(x) = o(x).$$

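ii. We now prove the following conditional statement. If for almost all $n$, 
$n - M(n) \le n^{1/2}g(n)$ with some function $g(n) = n^{o(1)}$, then $\#\mathcal{I}(x) = o(x)$. 

Without loss of generality we may assume that $g(n)$ is monotonically 
increasing. This time we write $\mathcal{I}(x)$ as the disjoint union of three sets, 
$\mathcal{J}_1(x)$, $\mathcal{J}_2(x)$ and $\mathcal{J}_3(x)$, where 

$$\mathcal{J}_1(x) = \left\{n \in \mathcal{I}(x) : n - b_s \le \frac{\sqrt{x}}{g(x)}\right\},$$

$$\mathcal{J}_2(x) = \left\{n \in \mathcal{I}(x) : \frac{\sqrt{x}}{g(x)} < n - b_s \le \sqrt{x}\,g(x)\right\},$$

$$\mathcal{J}_3(x) = \left\{n \in \mathcal{I}(x) : \sqrt{x}\,g(x) < n - b_s \le x^{0.76}\right\}.$$

Now $\#\mathcal{J}_1(x) \le x\,g(x)^{-2} = o(x)$, and by our assumption we also have 
$\#\mathcal{J}_3(x) = o(x)$. So to conclude we need to show that $\#\mathcal{J}_2(x) = o(x)$. This 
follows by the following observation. Let 

$$H(x,y,z) = \#\{n \le x : \exists\, d \mid n \text{ with } y < d \le z\}.$$

Then 

$$\#\mathcal{J}_2(x) \le H\left(x, \sqrt{x}/g(x), \sqrt{x}\,g(x)\right),$$

and by [7, Theorem 1], 

$$H\left(x, \sqrt{x}/g(x), \sqrt{x}\,g(x)\right) = o(x),$$

which concludes the proof. □ 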
We remark that the assumption of Theorem 8(ii) is weaker than 
Conjecture 4. The bound of Conjecture 4 probably holds for almost all primes. 
This would then imply that 

$$v(p) > 2(\tau(p-1)-1)$$

for almost all primes $p$. On the other hand, it is reasonable to expect that 
there are infinitely many primes of the form $n = 2^r3^s5^t + 1$ (in fact even of 
the form $p = 3 \cdot 2^k + 1$), and therefore equality would occur infinitely often 
as well. We conclude this section by proving that $v(n)$ can be substantially 
larger than $\tau(n-1)$. 

Theorem 9. There is an infinite sequence of integers $n_j$ with 

$$v(n_j) \ge \exp\left(\left(\frac{2\log 2}{11} + o(1)\right)\frac{\log n_j}{\log\log n_j}\right) \quad\text{and}\quad \tau(n_j - 1) = 2.$$

Proof. Let $n$ be a shifted prime, that is, $n = p + 1$, where $p$ is prime. We 
first show that for such integers, 

$$v(n) = v(p+1) \ge 2(\tau(2p+1) - 3).$$

Let $\ell$ be the line through $(1,1)$ which is tangent to $\alpha_2(n)$. Since $(1,1)$ and 
$(p,p)$ are the only points of $G_n$ on $\alpha_1(n)$, all of the points of $G_n$ lie on or 
below $\ell$. A straightforward calculation shows that $\ell$ meets $\alpha_2(n)$ at the point 
$(x,y)$ where the $x$-coordinate is 

$$x = \frac{1}{1 - ((p+1)/(2p+1))^{1/2}} < 4 \quad\text{for } p \ge 5.$$

Hence every divisor $d$ of $2p+1$, with $3 < d < (2p+1)/3$, gives rise to a 
vertex on $\alpha_2(n)$. Consequently the number of vertices on $\alpha_2(n)$ is at least 
$\tau(2p+1) - 4$. By symmetry there are an equal number of vertices on $\beta_2(n)$, 
and since $(1,1)$ and $(p,p)$ are also vertices of $C_n$, we obtain the desired 
inequality. 

We now let $Q_j$ denote the product of the first $j$ odd primes and set $p_j$ to be 
the smallest prime satisfying the congruence $2p_j \equiv -1 \pmod{Q_j}$. By the 
Prime Number Theorem $\log Q_j \sim j\log j$, and by Heath-Brown's [13] version 
of Linnik's theorem we have $p_j \le cQ_j^{11/2}$ for an absolute constant $c > 1$. On 
combining $p_j \le cQ_j^{11/2}$ with the asymptotic $\log Q_j \sim j\log j$ we obtain 

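$$\tau(2p_j+1) \ge \tau(Q_j) = 2^j \ge \exp\left(\left(\frac{2\log 2}{11} + o(1)\right)\frac{\log p_j}{\log\log p_j}\right).$$
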

Setting $n_j = p_j + 1$ we conclude the proof. □ 

In particular, we see from Theorem 9 that 

$$\limsup_{n\to\infty} \frac{\log v(n)}{\log\tau(n-1)} = \infty.$$

Furthermore we can replace the terms $\log v(n)$ and $\log\tau(n-1)$ by the $k$-fold 
iteration of the logarithm for any $k \in \mathbb{N}$. Unfortunately, we do not see any 
approaches to the following. 

Conjecture 10. We have 

$$\liminf_{n\to\infty} v(n) = \infty.$$

3.2 Upper Bounds 

Theorem 11. For $n \to \infty$, 

$$v(n) \le n^{3/4+o(1)}.$$

Proof. In Section 2.2 we labelled the highest vertex of $C_n$ in the triangle $\mathcal{T}_n$ 
by $(a_s,b_s)$. Trivially, $s \le a_s$ and $a_s \le n - b_s$. Hence 

$$v(n) \le 4s + 2 \le 4a_s + 2 \le 2(n - b_s + a_s + 1) = 2(n - M(n) + 1),$$

and the bound (4) concludes the proof. □ 

Most certainly the bound of Theorem 11 is not tight. If we assume 
Conjecture 4 then 

$$v(n) \le n^{1/2+o(1)}$$

for almost all $n$. This still seems too high and the actual order of $v(n)$ is 
almost certainly much smaller. A different upper bound for $v(n)$ can be 
derived from (8). For integers $n$ where $n-1$ has only small prime factors, 
this upper bound is significantly better than Theorem 11. 

Theorem 12. For $n \to \infty$, 

$$v(n) \le T(n-1)\,n^{o(1)}.$$
Proof. From (8) we see that only points from the curves $\alpha_j(n)$ and $\beta_j(n)$ 
where 

$$j \le m_n = \left\lfloor \frac{T(n-1)+3}{4} \right\rfloor$$

contribute to $v(n)$. Since every curve $\alpha_j(n)$, $\beta_j(n)$ contains at most $\tau(jn-1)$ 
points of $G_n$ we derive 

$$v(n) \le \sum_{j=1}^{m_n} 2\tau(jn-1).$$

We conclude by invoking the asymptotic inequality $\tau(r) \ll r^{o(1)}$, see [12, 
Theorem 315]. □ 

4 Computing $C_n$ 

4.1 Systematic search algorithm 

We now describe a deterministic algorithm to construct the vertices of $C_n$ that 
lie in the triangle $\mathcal{T}_n$. It is a variant of the famous algorithm of Graham [9] 
known as the Graham scan. The main virtue of our algorithm, as opposed 
to using some other convex closure algorithms, is that we do not need to 
generate and store all of the points of $G_n$ before determining the convex 
closure. Instead, we generate the points one by one, discard most of them 
along the way, and halt in a reasonable amount of time. 

Algorithm 13. 

1. Set $a_0 := 1$; $b_0 := 1$. 

2. For $i = 0, 1, \ldots$: 

(a) Set $a_{i+1}$ to be the smallest integer $a \in \mathbb{Z}_n^*$ satisfying the 
inequalities 

$$a_i < a \le \frac{n + a_i - b_i}{2} \quad\text{and}\quad b_i - a_i < a^{-1} - a,$$

where $a^{-1}$ is the inverse of $a$ modulo $n$. If either of the above conditions 
cannot be met the algorithm terminates. 

(b) Set $b_{i+1} := a_{i+1}^{-1}$. 

(c) Convexity check: 

i. If $i = 1$ go to Step 2(a). 

ii. If $i \ge 2$ and the angle between the points $(a_{i-1},b_{i-1})$, $(a_i,b_i)$ 
and $(a_{i+1},b_{i+1})$ is reflex then return to Step 2(a), otherwise 
discard the point $(a_i,b_i)$ and set 

$$a_i := a_{i+1}, \quad b_i := b_{i+1}, \quad i := i - 1$$

and return to Step 2(c). 

We note that the inequalities in Step 2(a) are motivated by Proposition 3. 
Clearly, Algorithm 13 is deterministic and it immediately follows from (4) 
that its complexity is $O(n^{3/4+o(1)})$. 
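
The candidate selection of Step 2(a) can be sketched in a few lines. The code below is our own simplified illustration, not a full implementation of Algorithm 13: it applies only the two inequalities of Step 2(a) and omits the convexity check of Step 2(c), so the returned list may contain points that are not vertices of $C_n$. The function name `candidate_chain` is hypothetical.

```python
from math import gcd

def candidate_chain(n):
    """Points (a, b) of G_n accepted by the two inequalities of Step 2(a).

    Starting from (1, 1), a point is accepted when a <= (n + a_i - b_i)/2 and
    b - a > b_i - a_i, where (a_i, b_i) is the last accepted point. The
    convexity check of Step 2(c) is deliberately omitted.
    """
    chain = [(1, 1)]
    a = 2
    while True:
        a_i, b_i = chain[-1]
        if 2 * a > n + a_i - b_i:   # a has left the admissible range: stop
            break
        if gcd(a, n) == 1:
            b = pow(a, -1, n)       # modular inverse, Python 3.8+
            if b - a > b_i - a_i:   # strictly larger difference, as in Proposition 3
                chain.append((a, b))
        a += 1
    return chain

if __name__ == "__main__":
    print(candidate_chain(41))
```

The last accepted point always attains the maximal difference $M(n)$, in agreement with $b_s - a_s = M(n)$, and the loop never examines values of $a$ beyond $(n - 1)/2$.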

4.2 Factorisation based algorithm 

The observation that the points in $G_n \cap \alpha_1(n)$ are vertices of $C_n$ combined 
with (8) allows us to devise a variation on Algorithm 13. The idea is to first 
use factorisation to create a smaller input set and then run the algorithm. 
Let $\mathcal{P}_n$ be the polygonal region with vertices 

$$(1, n-1),\ (1,1),\ \left(d_1, n-\frac{n-1}{d_1}\right), \ldots, \left(d_k, n-\frac{n-1}{d_k}\right),$$

$$\left(\frac{(n-1)/d_k + d_k}{2},\ n - \frac{(n-1)/d_k + d_k}{2}\right),\ \left(\sqrt{n-1},\ n-\sqrt{n-1}\right),$$

where $1 = d_0 < d_1 < \cdots < d_k$ are the factors of $n-1$ which are less than or 
equal to $\sqrt{n-1}$. Since the vertices of $C_n$ can only lie on the curves $\alpha_j(n)$, 
$\beta_j(n)$ where 

$$j \le m_n = \left\lfloor \frac{T(n-1)+3}{4} \right\rfloor,$$

we need only determine which of the points of the union 

$$U_n = \bigcup_{j=1}^{m_n} S_{j,n}$$

are vertices of $C_n$, where $S_{j,n} = \alpha_j(n) \cap G_n \cap \mathcal{P}_n$. It is useful to keep in mind 
that 

$$\#U_n \le \sum_{j=1}^{m_n} \#S_{j,n} \le \sum_{j=1}^{m_n} \tau(jn-1) = m_n n^{o(1)},$$

see [12, Theorem 315]. We now apply the following algorithm. 

Algorithm 14. 

1. Factorization: 

(a) Find all of the factors $1 = d_0 < d_1 < \cdots < d_k \le \sqrt{n-1}$ of $n-1$. 

(b) Set $S_{1,n} := \{(1,1), (d_1, n-(n-1)/d_1), \ldots, (d_k, n-(n-1)/d_k)\}$. 

(c) Compute $t := T(n-1)$. 

(d) Set $m_n := \lfloor (t+3)/4 \rfloor$. 

(e) For $j = 2, \ldots, m_n$, factor $jn-1$ and construct the set $S_{j,n}$. 

(f) Set $U_n := \bigcup_{j=1}^{m_n} S_{j,n}$. 

2. Determining the vertices: 

(a) Order the points of $U_n$ by increasing first co-ordinate. 

(b) Apply the appropriate versions of Steps 2(a) and 2(c) of Algorithm 13 
to the elements of $U_n$. 
The complexity of Algorithm 14 depends on the type of algorithm we 
use for the factorisation step. If we use any subexponential probabilistic 
factorisation algorithm which runs in time $n^{o(1)}$ (see [4, Chapter 6]), then 
the complexity of Step 1 of Algorithm 14 is at most $m_n n^{o(1)}$. 
Furthermore, the complexity of Step 2 of Algorithm 14 is of the same form 
as well. So the overall complexity of Algorithm 14 is at most 

$$m_n n^{o(1)} = T(n-1)\,n^{o(1)}.$$

This is lower than that of Algorithm 13 if $T(n-1) \le n^{3/4}$. For any fixed 
$\lambda > 0$ the proportion of the positive integers $k$ with $T(k) \le k^{\lambda}$ is given by 
a certain continuous function $\psi(\lambda) > 0$, see [23]. Using [18, Corollary A] we 
conclude that 

$$\psi(3/4) = \int_0^{7/8} \rho\left(\frac{1}{x}-1\right)\frac{dx}{x} = \int_{1/7}^{\infty} \rho(y)\,\frac{dy}{1+y} = 0.866468\ldots,$$

where $\rho(u)$ is the Dickman function, see [5] or [21, Section III.5.4]. Thus 
the proportion of the positive integers $n$ with $T(n-1) \le n^{3/4}$ is $\psi(3/4) = 
0.866468\ldots$. (The bound in Step 1(d) of Algorithm 14 is certainly not tight. It 
can probably be replaced by a bound of order $n^{o(1)}$ or even possibly a power 
of $\log n$, but unfortunately we have not been able to prove such a result.) 
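
The numerical value of $\psi(3/4)$ can be checked by hand. The short derivation below is ours; it starts from the representation $\psi(3/4) = \int_{1/7}^{\infty}\rho(y)\,dy/(1+y)$ and assumes two standard facts about the Dickman function: $\rho(y) = 1$ for $0 \le y \le 1$, and $\int_0^{\infty}\rho(y)\,dy/(1+y) = 1$ (the density $\rho(1/x-1)/x$ of $\log P(k)/\log k$ integrates to 1 over $[0,1]$).

```latex
\begin{align*}
\psi(3/4) &= \int_{1/7}^{\infty} \rho(y)\,\frac{dy}{1+y}
           = 1 - \int_{0}^{1/7} \frac{dy}{1+y} \\
          &= 1 - \log\frac{8}{7} = 0.866468\ldots
\end{align*}
```

Equivalently, $T(k) > k^{3/4}$ holds essentially when the largest prime factor of $k$ exceeds $k^{7/8}$, an event of density $\log(8/7)$.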

On the other hand, if we use a deterministic factoring algorithm in Step 1, 
then Algorithm 14 is of complexity at most 

$$m_n(m_n n)^{1/4+o(1)} = T(n-1)^{5/4}\,n^{1/4+o(1)}$$

unconditionally, and of complexity at most 

$$m_n(m_n n)^{1/5+o(1)} = T(n-1)^{6/5}\,n^{1/5+o(1)}$$

under the Extended Riemann Hypothesis, see [4, Section 6.3]. Accordingly, 
this is better than Algorithm 13 for $T(n-1) \le n^{2/5}$ and $T(n-1) \le n^{11/24}$, 
respectively. The corresponding proportions of the positive integers $n$ 
satisfying these inequalities are $\psi(2/5)$ and $\psi(11/24)$. Since [18, Corollary A] 
expresses both $\psi(2/5)$ and $\psi(11/24)$ as double integrals, it is easier to 
compute $\psi(3/4)$ than either of these two values. 

5 Computational Results 

5.1 Expected value of $V(N)$ 

Let 

$$\eta = \sum_{p} \frac{\log(1-1/p)}{p} = -0.580058\ldots,$$
where the sum runs over all prime numbers p. Surprisingly enough, this quan- 
tity has already appeared in various, seemingly unrelated number theoretic 
questions, see [HI page 122]. 

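Proposition 15. We have 

$$\frac{1}{N}\sum_{n=1}^{N} \log\varphi(n) = \log N + \eta - 1 + o(1).$$

Proof. Obviously, 

$$\frac{1}{N}\sum_{n=1}^{N} \log\varphi(n) = \frac{1}{N}\sum_{n=1}^{N} \log n + \frac{1}{N}\sum_{n=1}^{N}\sum_{p \mid n} \log(1-1/p),$$

where the last sum is taken over prime divisors $p \mid n$. The first sum on the 
right-hand side is $\log N - 1 + o(1)$ by Stirling's formula. By changing the 
order of summation in the second sum, we derive 

\begin{align*}
\frac{1}{N}\sum_{n=1}^{N}\sum_{p \mid n} \log(1-1/p)
 &= \frac{1}{N}\sum_{p \le N} \log(1-1/p) \sum_{\substack{n \le N \\ p \mid n}} 1 \\
 &= \frac{1}{N}\sum_{p \le N} \log(1-1/p)\left(\frac{N}{p} + O(1)\right) \\
 &= \sum_{p \le N} \frac{\log(1-1/p)}{p} + O\left(\frac{1}{N}\sum_{p \le N}\frac{1}{p}\right) \\
 &= \sum_{p \le N} \frac{\log(1-1/p)}{p} + O\left(\frac{\log\log N}{N}\right),
\end{align*}

where the last step follows by Mertens's formula, see [12, Theorem 427]. 
Observing that 

$$\eta - \sum_{p \le N} \frac{\log(1-1/p)}{p} = \sum_{p > N} \frac{\log(1-1/p)}{p} = O\left(\frac{1}{N}\right),$$

we conclude our proof. □ 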
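
Proposition 15 is easy to test numerically. The sketch below is ours (the function names are illustrative): it computes $\varphi(n)$ for $n \le N$ with a sieve, averages $\log\varphi(n)$, and compares the result with $\log N + \eta - 1$, where $\eta$ is approximated by its partial sum over primes $p \le N$.

```python
from math import log

def totients(limit):
    """phi(0..limit) via a sieve: multiply by (1 - 1/p) for each prime p | n."""
    phi = list(range(limit + 1))
    for p in range(2, limit + 1):
        if phi[p] == p:                      # p is prime: untouched so far
            for k in range(p, limit + 1, p):
                phi[k] -= phi[k] // p
    return phi

def average_log_phi(N):
    """Left-hand side of Proposition 15: (1/N) * sum_{n<=N} log phi(n)."""
    phi = totients(N)
    return sum(log(phi[n]) for n in range(1, N + 1)) / N

def eta_partial(N):
    """Partial sum of eta = sum_p log(1 - 1/p)/p over primes p <= N."""
    phi = totients(N)
    return sum(log(1 - 1 / p) / p for p in range(2, N + 1) if phi[p] == p - 1)

if __name__ == "__main__":
    N = 20000
    print(average_log_phi(N), log(N) + eta_partial(N) - 1)
```

Already for moderate $N$ the two printed values should agree to about three decimal places, and the partial sum is close to the quoted value of $\eta$.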

Combining heuristic (6) with Proposition 15 for the average $V(N)$, we 
get the heuristic $V(N) \approx H(N)$, where 

$$H(N) = \frac{8}{3}(\log N + \gamma + \eta - 1 - \log 2) \approx 2.66666 \cdot \log N - 4.52264.$$

In Figure 3 we compare the graphs of $V(N)$, $H(N)$ and the least squares 
approximation 

$$L(N) = 3.551166 \cdot \log N - 9.610899 \qquad (9)$$

to $V(N)$, where $N$ ranges over the interval $[2, 5770001]$. The values of $V(N)$ 
are represented by diamonds along the graph of $L(N)$, while $H(N)$ is the 
lower curve. 

Fig. 3. $V(N)$, $H(N)$, and $L(N)$ for $2 \le N \le 5770001$ 
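
The numerical constants in $H(N)$ follow directly from $\gamma$, $\eta$ and $\log 2$. The following sketch (ours; it simply transcribes the two formulas above, with the value of $\eta$ taken as quoted in this section) evaluates $H(N)$ and the least squares fit $L(N)$ so that the gap between them can be inspected.

```python
from math import log

EULER_GAMMA = 0.5772156649015329   # Euler's constant
ETA = -0.580058                    # value of eta quoted in Section 5.1

def H(N):
    """Heuristic average H(N) = (8/3) * (log N + gamma + eta - 1 - log 2)."""
    return 8.0 / 3.0 * (log(N) + EULER_GAMMA + ETA - 1.0 - log(2.0))

def L(N):
    """Least squares fit (9) reported for the computed values of V(N)."""
    return 3.551166 * log(N) - 9.610899

if __name__ == "__main__":
    for N in (10**3, 10**5, 5770001):
        print(N, round(H(N), 3), round(L(N), 3))
```

Evaluating `H(1)` recovers the constant term $-4.52264$, and at the right end of the range the fit $L(N)$ lies well above $H(N)$.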

We see that although $V(N)$ behaves like a logarithmic function and thus 
resembles $H(N)$, they clearly deviate. This deviation seems to be of regular 
nature and suggests that there should be a natural explanation for this 
behaviour of $V(N)$. In an attempt to understand this we computed $v(n)$, $h(n)$ 
and $\tau(n-1)$ for 50000 random integers in the interval [10^,10^], and did 
some comparisons. We present the individual data in the histograms in 
Figures 4 and 5, and the comparisons in Figures 6, 7, 8, 9 and 11. In several 
histograms the extreme values on the right are not visible, since for visual 
clarity we have truncated them on the right. Under each histogram we state 
in the caption the minimum value, the maximum value and the number of 
values that are not shown. 



Fig. 4. Frequency histogram of $v(n)$ 
min = 14, max = 766 (645 values omitted) 

Fig. 5. Frequency histogram of $h(n)$ 
min = 33.01, max = 48.81 

Fig. 6. Frequency histogram of $v - h$ 
min = −29.93, max = 714.41 (458 values omitted) 

Fig. 7. Frequency histogram of $2(\tau(n-1)-1) - h(n)$ 
min = −44.96, max = 714.41 (443 values omitted) 

Fig. 8. Frequency histogram of $(v-h)/h$ with a 3-parameter lognormal fit 
min = −0.68, max = 14.77 (170 values omitted) 

Fig. 9. Frequency histogram of $(v-h)/h$ with a loglogistic fit 
min = −0.68, max = 14.77 (170 values omitted) 

The histograms in Figures 6, 8, and 9 provide evidence that for most 
values of $n$, $h(n)$ is a good approximation to $v(n)$. This leads to the main 
peak. After comparing the histograms in Figures 6 and 7, it is plausible to 
speculate that some of the secondary peaks of $v(n) - h(n)$ to the right of 
the main peak correspond to large values of $\tau(n-1)$ that are quite "popular". It would be 
very interesting to find (at least heuristically) a right model which describes 
these secondary peaks (their height, frequency and so on). 

Let $X$ be a random variable. We say that $X$ is lognormally distributed 
if $\log X$ is normally distributed, and $X$ is loglogistically distributed if $\log X$ 
has a logistic distribution. The probability density function of the lognormal 
distribution is 

$$f(x;\mu,\sigma) = \frac{\exp\left(-(\log x-\mu)^2/(2\sigma^2)\right)}{\sqrt{2\pi}\,\sigma x},$$

where $\mu$ and $\sigma^2$ are the mean and variance of $\log X$. The probability density 
function of the loglogistic distribution is 

$$f(x;\mu,\sigma) = \frac{\exp((\log x-\mu)/\sigma)}{\sigma x\,(1+\exp((\log x-\mu)/\sigma))^2},$$

where $\mu$ is the scale parameter and $\sigma$ is the shape parameter. 
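
The two densities are straightforward to evaluate. The sketch below is ours and simply transcribes the formulas above for $x > 0$; it does not include the additional threshold (shift) parameter of the 3-parameter lognormal fit mentioned in the caption of Figure 8.

```python
from math import exp, log, pi, sqrt

def lognormal_pdf(x, mu, sigma):
    """Density of X when log X is normal with mean mu and variance sigma**2."""
    return exp(-(log(x) - mu) ** 2 / (2 * sigma ** 2)) / (sqrt(2 * pi) * sigma * x)

def loglogistic_pdf(x, mu, sigma):
    """Density of X when log X is logistic with parameters mu and sigma."""
    z = exp((log(x) - mu) / sigma)
    return z / (sigma * x * (1 + z) ** 2)

if __name__ == "__main__":
    for x in (0.5, 1.0, 2.0, 5.0):
        print(x, lognormal_pdf(x, 0.0, 1.0), loglogistic_pdf(x, 0.0, 1.0))
```

With $\mu = 0$ and $\sigma = 1$ the two densities at $x = 1$ are $1/\sqrt{2\pi}$ and $1/4$ respectively; the loglogistic density has the heavier right tail.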

In Figures 8 and 9 we have provided the scaled histograms of $(v-h)/h$ 
with the lognormal fit and the loglogistic fit respectively, as both of them 
seem to be reasonable approximations. Numerically, the loglogistic fit seems 
to be better. However, here is a heuristic argument (articulated by one of 
the referees) suggesting that the lognormal is more accurate. By the 
Erdős-Kac theorem [21, III.4.4, Theorem 8], $\omega(s)$ is normally distributed, and since 
$\tau(s) = 2^{\omega(s)+O(1)}$ for most integers $s$, we conclude that $\log\tau(s)$ is also 
normally distributed. Given the connection between $v(n)$ and the divisor 
functions, it seems reasonable to believe that a lognormal distribution is more 
accurate. 

As a curiosity, we also mention that in the highly asymmetric histograms 
of Figures 6, 8 and 9 we still have $v(n) < h(n)$ in 25057 out of 50000 cases. It 
would be interesting to understand whether this is a coincidence, or whether 
there is some regular effect behind this. 

Our heuristic explanation for the difference between $V(N)$ and $H(N)$ is as 
follows. Overall, $G_n$ behaves as a "pseudorandom" set, but (as we observed 
in Theorem 6) there are some "regular points" on the convex closure arising 
from the divisors of $n-1$. For a typical integer $n$, these points have little 
effect, but for exceptional values of $n$, they make a substantial contribution 
to the value of $v(n)$ which is sufficient to interfere with the "pseudorandom" 
behavior of $G_n$. To see this, it is useful to recall that although for most 
integers we have 

$$\tau(n-1) = (\log n)^{\log 2+o(1)} = h(n)^{\log 2+o(1)},$$

see [12, Theorem 432], on the average we have 

$$\sum_{n=2}^{N} \tau(n-1) \sim N\log N \sim \frac{3N}{8}H(N),$$

see [12, Theorem 320]. Therefore, the contribution of $2\tau(n-1)$ from the 
points on the curves $\alpha_1(n)$ and $\beta_1(n)$ (see Theorem 6) is negligible compared 
to $h(n)$ for almost all $n$, but on average is of the same order as $0.75H(N)$. 
Thus it is plausible to assert that the values of $H(N)$ reflect only the 
"pseudorandom" nature of $G_n$, whereas the contribution of $2\tau(n-1)$ from the 
curves $\alpha_1(n)$, $\beta_1(n)$ reflects certain "regular" properties of the points of $G_n$. 


5.2 Weighted average contribution of divisors 

The lower bound of Theorem 6 takes into account only the contribution from 
the divisors of $n-1$. It is plausible to assume that the divisors of $jn-1$, with 
"small" $j \ge 2$, also give some regular contribution to $v(n)$. This probably 
requires some completely new arguments since the contribution from such 
divisors is certainly not additive. 

Experimenting with some weighted averages involving $\tau(jn-1)$ for "small" 
values of $j$, we have found $g_1(n)$ and $g_2(n)$, where 

[log n\ 

giin) = 2(r(n-l)-l) + 2 5^ j-3/V(jn-l), 

i=2 

[log n\ 

g2{n) = 2(r(n-l)-l) + 2e ^ e-Mjn-1), 

i=2 

to be "reasonable" numerical approximations to $v(n)$. 

It is too early to make any substantiated conjecture about the true con- 
tribution from the divisors of jn — 1 with j > 2. Numerical experiments for 
a much broader range as well as some new ideas are needed. Nevertheless, 
our calculation raises the following question. 

Question 16. Are there "natural" coefficients $c_j$, $j = 2, 3, \ldots$, and a function 
$J(n)$, such that if we define $g(n)$ to be 

$$g(n) = 2\tau(n-1) + \sum_{j=2}^{J(n)} c_j\,\tau(jn-1),$$

then we have 

$$V(N) \sim \frac{1}{N-1}\sum_{n=2}^{N} g(n)$$

as $N \to \infty$? 


Clearly, if $V(N) \sim C\log N$, then the answer to Question 16 is positive, 
and one could then set $J(n) = 2$ and determine the value of $c_2$ by "reverse 
engineering". However, we are asking for coefficients $c_j$ and a function $J(n)$ 
that can be explained by some intrinsic reasons, provided such reasons exist! 



5.3 The difference $v(n) - 2(\tau(n-1)-1)$ 

Another computer experiment that we ran on our random set of 50000 
integers was to check the values of the difference $v(n) - 2(\tau(n-1)-1)$. The 
histogram of our experiment is given in Figure 10. 
Fig. 10. Frequency histogram of $v(n) - 2(\tau(n-1)-1)$ 
min = 0, max = 484 (199 values omitted) 

The graph of Figure 10 suggests that the most "popular" value of 
$v(n) - 2(\tau(n-1)-1)$ is 0. There is some obvious regularity in the distribution of 
other values which would be interesting to explain. 

The way we have derived the lower bound of Theorem 6 on the frequency 
of the occurrence of $v(n) = 2(\tau(n-1) - 1)$ from (jHj) raises the following 
question: 
Question 17. Is $\tau(n-1) = O(1)$ for all (or nearly all) integers $n$ with 
$v(n) = 2(\tau(n-1) - 1)$? 

An affirmative answer to this question would then allow us to conclude 
that 

\[ \#\{n \le x : v(n) = 2(\tau(n-1) - 1)\} \;\ldots\; \frac{x}{\log x}. \]

In our random set of 50000 integers we have 10764 integers satisfying 
the equality $v(n) = 2(\tau(n-1) - 1)$. For this set of 10764 integers we have 
computed the value of $t(n)$, where $t(n) = \lfloor (\tau(n-1) + 3)/4 \rfloor$. We give this 
histogram in Figure 11. We remark that for 7198 integers of this sample the 
value of $t(n)$ is 1, and for 2413 integers of this sample the value of $t(n)$ is 2. 
Thus for at least 9611 integers out of 10764 cases, we have r„ fl 0^2 (^) = 0. 
Fig. 11. Frequency histogram of $t(n) = \lfloor (\tau(n-1) + 3)/4 \rfloor$ 
min = 1, max = 26 (39 values omitted) 

We have also found on examining the data that $v(n) - 2(\tau(n-1) - 1)$ is 
invariably a multiple of 4, and this suggests the following conjecture. 

Conjecture 18. For almost all $n$, 

\[ v(n) \equiv 2(\tau(n-1) - 1) \pmod 4. \]



We have a simple heuristic argument for this conjecture. We know that 
$\tau(n-1)$ is odd if and only if $n-1$ is a square. Thus the conjecture reduces 
to the statement that for almost all $n$, $4 \nmid v(n)$. On invoking Propositions [T] 
and E] we have that $4 \mid v(n)$ if and only if the vertex $(a_s, b_s)$ lies on the line 
$x + y = n$. Intuitively this seems to be a very rare occurrence (unfortunately, 
at present we are unable to put this key remark in a rigorous context); we 
typically see that $a_s + b_s = n$ only when $n$ is the shifted square $m^2 + 1$. 
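The parity step behind this reduction can be spelled out as follows. This is a sketch under the assumption $n \ge 3$; the evenness of $v(n)$ is derived here from the central symmetry $(a,b) \mapsto (n-a, n-b)$ of $G_n$, which is an added argument and is not quoted from the propositions cited above.

```latex
% Sketch: why Conjecture 18 reduces to "4 does not divide v(n)" for almost all n.
% Assumption: n >= 3, so the centre (n/2, n/2) is not a point of G_n.
\begin{align*}
&\tau(m) \text{ is odd} \iff m \text{ is a perfect square}
   \quad (\text{divisors pair up as } d \leftrightarrow m/d), \\
&n-1 \text{ not a square} \implies \tau(n-1) - 1 \text{ is odd}
   \implies 2(\tau(n-1) - 1) \equiv 2 \pmod 4, \\
&(a,b) \in G_n \implies (n-a,\, n-b) \in G_n
   \implies v(n) \equiv 0 \pmod 2, \\
&\text{hence } v(n) \equiv 2(\tau(n-1) - 1) \pmod 4
   \iff v(n) \equiv 2 \pmod 4 \iff 4 \nmid v(n).
\end{align*}
```

Since the integers $n$ for which $n - 1$ is a perfect square have density zero, they do not affect an "almost all" statement.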

6 Other Curves 

Studying the point sets 

\[ F_n(f) = \{(a,b) : a, b \in \mathbb{Z},\ f(a,b) \equiv 0 \pmod n,\ 0 \le a, b \le n-1\}, \]

where $f(X,Y) \in \mathbb{Z}[X,Y]$, is certainly a natural question, and this has been 
done in a number of works, see [3], [10], [25], [27] and references therein. In the 
case of a prime modulus $p$, one can use the Bombieri pQ bound on exponential 
sums along a curve as a substitute for the bound on Kloosterman sums. In 
particular, for a prime $n = p$, under some mild assumptions on the polynomial 
$f$, one can easily obtain an analogue of Theorem 11 for the sets $F_p(f)$. 
However, our other results are specific to the sets $G_n$ and cannot be extended 
to other curves. It is worth remarking that for composite $n$ there are some 
analogues of the Bombieri bound, see [21], but quite naturally they are much 
weaker than the bound of [I]. So Kloosterman sums are one of very few 
examples where the strength of the bound remains almost unaffected by the 
arithmetic structure of the modulus. 

Our preliminary tests show that the sets $F_n(f)$ and $F_p(f)$ have less 
"infrastructure" than $G_n$ and behave more like truly random sets. For example, 
let $w_f(n)$ denote the number of vertices of the convex hull of $F_n(f)$. We now let 

\[ h_f(n) = \frac{8}{3}\left(\log(\#F_n(f)) + \gamma - \log 2\right). \]

The histograms in Figures 12-14 show the relative difference $(w_f - h_f)/h_f$ 
for random quadratic and cubic polynomials. For the histogram of Figure 12 
we chose a random value of $n$ in the interval [10000, 300000]. Then, based on 
the value of $n$, we randomly chose the coefficients $a$, $b$, $c$ and took $f(x,y)$ to 
be the polynomial 

\[ f(x,y) = y - ax^2 - bx - c. \]

We did this for 10000 values of $n$. For the histogram of Figure 13 we repeated 
this same experiment with random quadratic polynomials for 1000 random 
primes in the interval [7919, 611953]. For the histogram of Figure 14 we 
repeated our first numerical experiment (again for 10000 values of $n$), but 
this time with random cubics 

\[ f(x,y) = y - ax^3 - bx^2 - cx - d. \]
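One trial of this kind of experiment can be sketched as follows. The sketch assumes the estimate $\frac{8}{3}(\log \#F_n(f) + \gamma - \log 2)$ for the expected number of hull vertices, a nonzero leading coefficient (the text does not say how degenerate polynomials were handled), and a monotone-chain hull; the function names and the use of Python's `random` module are illustrative choices, not the authors' setup.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def heuristic_vertex_count(num_points):
    """(8/3) * (log(num_points) + gamma - log 2)."""
    return (8.0 / 3.0) * (math.log(num_points) + EULER_GAMMA - math.log(2.0))


def polynomial_graph(n, coeffs):
    """Points (x, y), 0 <= x, y <= n - 1, with y = sum(coeffs[k] * x**k) mod n."""
    points = []
    for x in range(n):
        y = 0
        for c in reversed(coeffs):  # Horner's rule, highest degree first
            y = (y * x + c) % n
        points.append((x, y))
    return points


def cross(o, p, q):
    """Cross product of the vectors o->p and o->q."""
    return (p[0] - o[0]) * (q[1] - o[1]) - (p[1] - o[1]) * (q[0] - o[0])


def count_hull_vertices(points):
    """Number of extreme points of the convex hull (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return len(pts)

    def half_hull(seq):
        chain = []
        for p in seq:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain

    return len(half_hull(pts)) + len(half_hull(reversed(pts))) - 2


def relative_difference(n, degree, rng):
    """(w - h) / h for one random polynomial of the given degree modulo n."""
    # Assumption: lower coefficients uniform in [0, n), leading one nonzero.
    coeffs = [rng.randrange(n) for _ in range(degree)] + [rng.randrange(1, n)]
    points = polynomial_graph(n, coeffs)
    w = count_hull_vertices(points)
    h = heuristic_vertex_count(len(points))
    return (w - h) / h
```

A call such as `relative_difference(10007, 2, random.Random(1))` gives one sample for a quadratic; collecting many such samples over random $n$ yields a histogram of the same kind as those in Figures 12-14.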
Fig. 12. Frequency histogram of $(w_f - h_f)/h_f$ for random quadratics $f$ over random $n$ 

Fig. 14. Frequency histogram of $(w_f - h_f)/h_f$ for random cubics $f$ over random $n$ 
The histograms of Figures 12-14 suggest that the quantities 

\[ \frac{w_f(n) - h_f(n)}{h_f(n)} \quad\text{and}\quad \frac{w_f(p) - h_f(p)}{h_f(p)} \]

are both normally distributed with mean 0, and so we make the following 
"Erdős-Kac" type conjectures. 
Let 

\[ \Phi_\sigma(z) = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{z} \exp\left(-\frac{t^2}{2\sigma^2}\right) dt \]

denote the cumulative distribution function of a normal distribution with 
mean 0 and variance $\sigma^2$. 
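For comparing a histogram with this distribution numerically, the cumulative distribution function of a centred normal law with standard deviation $\sigma$ can be evaluated through the error function. The sketch below uses the identity $\Phi_\sigma(z) = \frac{1}{2}\left(1 + \operatorname{erf}\left(z/(\sigma\sqrt{2})\right)\right)$; the helper names `normal_cdf` and `empirical_cdf` are illustrative.

```python
import math


def normal_cdf(z, sigma):
    """CDF at z of a normal distribution with mean 0 and variance sigma**2."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return 0.5 * (1.0 + math.erf(z / (sigma * math.sqrt(2.0))))


def empirical_cdf(samples, z):
    """Fraction of the samples that are strictly less than z."""
    return sum(1 for s in samples if s < z) / len(samples)
```

Given a list of observed relative differences, one can compare `empirical_cdf(samples, z)` with `normal_cdf(z, sigma)` for a few values of `z`, with `sigma` taken, for example, as the sample standard deviation.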

Conjecture 19. For each integer $n > 1$ we choose a sequence $\mathcal{F} = (f_n)$ of 
polynomials $f_n(x,y) \in \mathbb{Z}_n[x,y]$ of a fixed degree $d \ge 2$, chosen uniformly at 
random over the residue ring $\mathbb{Z}_n$, and let 



aAN) 
PAN) 



n<N 
^ ' p<N 
Then for any real $z$, 

\[ \#\{n \le N : (w_{f_n}(n) - h_{f_n}(n))/h_{f_n}(n) < z\} \;\ldots \]

\[ \#\{p \le N : (w_{f_p}(p) - h_{f_p}(p))/h_{f_p}(p) < z\} \;\ldots \]

with probability 1 (over the choice of $\mathcal{F} = (f_n)$) as $N \to \infty$. 
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